Queues: benchmark implementations against each other #2276

Open: wants to merge 19 commits into main

Conversation

@polybeandip (Collaborator) commented Sep 3, 2024

Makes progress towards #2221

Changes:

  • implement strict and round_robin scheduling transactions via stable_binheap (a rough model of the idea follows this list)
  • use generate_name in ComponentBuilder.case; this way, there are fewer conflicts when a component uses multiple cases
    • change the associated case test
  • tweak gen_test_data.sh to copy round_robin and strict data files into tests/binheap/
  • make clean_test_data.sh purge data files in tests/binheap/round_robin and tests/binheap/strict
  • make runt.toml test our new queues
  • set up cycles.sh to generate cycle counts for all tests
  • set up resources.sh to generate synthesis results for all tests
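
Since the first bullet leans on the rank-plus-stable-heap trick, here is a rough Python model of round-robin on top of a stable min-heap. This is only an illustration of the general idea, not the calyx-py/Calyx implementation in this PR; every name in it is made up, and heapq plus an insertion counter stands in for stable_binheap.

```python
import heapq
from itertools import count

class RoundRobinModel:
    """Toy model: each element gets a rank, the heap pops the smallest rank,
    and an insertion counter keeps pops within a flow in FIFO order (the job
    the stable heap does in hardware)."""

    def __init__(self, num_flows):
        self.heap = []
        self.flow_rank = [0] * num_flows  # rank last handed to each flow
        self.served_rank = 0              # rank of the most recently popped element
        self.seq = count()                # tie-breaker standing in for heap stability

    def push(self, flow, value):
        # Schedule the element one "turn" after whichever is later: the flow's
        # previous element or the element most recently served.
        rank = max(self.flow_rank[flow], self.served_rank) + 1
        self.flow_rank[flow] = rank
        heapq.heappush(self.heap, (rank, next(self.seq), value))

    def pop(self):
        rank, _, value = heapq.heappop(self.heap)
        self.served_rank = rank
        return value
```

For example, pushing a, b from flow 0 and c, d from flow 1 and then popping four times yields a, c, b, d.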

- Setup flit install
- Tweak action to install queues pkg
- Run test suite with runt
- alter CI to run gen script before runt tests
- keep tests/binheap/binheap_test.{data, expect} (they're small)
- separate pkg install and data gen into different jobs
- fix SDN tests
- make binheap tests run
- also tweak calyx-py case test
@anshumanmohan (Contributor)

Hi @polybeandip, thanks for getting this going! Since you are knee-deep in this stuff, I would like to tack on a quick request that should be easy to handle. @parthsarkar17 is interested in adding some queue-ey stuff to his benchmarking suite that optimizes FSMs and such. Could you please pass him designs for one or two of our more complicated tree schedulers? For each design, he'll need a .futil file, a .data file, and a .expect file.

Some candidates that come to mind:

  • Cassandra did up a "complex tree" that uses only rr and strict but has a height of 2 or 3.
  • Some of your newest implementations of schedulers using stable binary heaps.

- setup script for generating cycle counts: cycles.sh
- report cycle counts in cycles.txt
- tweak clean_test_data.sh to remove binheap/round_robin and binheap/strict tests
@polybeandip (Collaborator, Author) commented Sep 11, 2024

UPDATE on our benchmarks

cycles.sh generates cycle counts for all our designs. Remember to run gen_test_data.sh first!

For now, I've also reported this data in cycles.txt and below:

binheap/binheap_test.py: 750
binheap/fifo_test.py: 1509164
binheap/pifo_test.py: 1784719
binheap/round_robin/rr_2flow_test.py: 1870740
binheap/round_robin/rr_3flow_test.py: 1884450
binheap/round_robin/rr_4flow_test.py: 1897807
binheap/round_robin/rr_5flow_test.py: 1903303
binheap/round_robin/rr_6flow_test.py: 1934811
binheap/round_robin/rr_7flow_test.py: 1944544
binheap/stable_binheap_test.py: 1802173
binheap/strict/strict_2flow_test.py: 1785391
binheap/strict/strict_3flow_test.py: 1826179
binheap/strict/strict_4flow_test.py: 1842823
binheap/strict/strict_5flow_test.py: 1852314
binheap/strict/strict_6flow_test.py: 1851588
complex_tree_test.py: 1504412
fifo_test.py: 595422
pifo_tree_test.py: 1199525
round_robin/rr_2flow_test.py: 993313
round_robin/rr_3flow_test.py: 1013489
round_robin/rr_4flow_test.py: 1032869
round_robin/rr_5flow_test.py: 1051211
round_robin/rr_6flow_test.py: 1125919
round_robin/rr_7flow_test.py: 1136933
sdn_test.py: 1244054
strict/strict_2flow_test.py: 1058077
strict/strict_3flow_test.py: 1144637
strict/strict_4flow_test.py: 1233029
strict/strict_5flow_test.py: 1315433
strict/strict_6flow_test.py: 1481437

It looks like @csziklai's PIFOs do about 300,000 to 1,000,000 cycles better than the binary heaps!
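
As a sanity check on that range, here is a quick script over the numbers above, pairing a few binheap-based tests with their specialized counterparts (only representative pairs shown):

```python
# Cycle counts copied from the table above: (binheap version, specialized version)
pairs = {
    "fifo":         (1_509_164, 595_422),
    "rr_2flow":     (1_870_740, 993_313),
    "rr_7flow":     (1_944_544, 1_136_933),
    "strict_2flow": (1_785_391, 1_058_077),
    "strict_6flow": (1_851_588, 1_481_437),
}
for name, (binheap, other) in pairs.items():
    print(f"{name}: binheap is {binheap - other:,} cycles slower")
```

The gaps come out between roughly 370,000 (strict_6flow) and 914,000 (fifo) cycles.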

@anshumanmohan (Contributor)

!! Please tell the channel!

@rachitnigam (Contributor)

It looks like @csziklai's PIFOs do about 300,000 to 1,000,000 cycles better than the binary heaps!

This sounds awesome, but I would caution against celebrating the numbers too much without synthesis results. Something that takes 1,000,000 cycles at 100 MHz is still way faster than something that takes 100,000 cycles at 1 MHz.
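
Spelling out the arithmetic behind that example (total time = cycles / clock frequency):

```python
print(1_000_000 / 100e6)  # 0.01 s, i.e. 10 ms at 100 MHz
print(100_000 / 1e6)      # 0.10 s, i.e. 100 ms at 1 MHz
```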

@polybeandip (Collaborator, Author)

Agreed! Synthesis results are coming up shortly

@polybeandip (Collaborator, Author) commented Sep 12, 2024

UPDATE on our synthesis results

resources.sh generates "executive summaries" of the synthesis reports for all our designs: i.e. the output of

fud e --to resource-estimate <filename>.futil

for all our queues. Notably, this means our results (resources.txt) are missing certain stats (for example, BRAMs) and use the default .xdc file: i.e. a clock period of 7 ns. Per @sampsyo's suggestion, this frequency should be okay for now.
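
For reference, a minimal sketch of the kind of loop a resources.sh-style driver could run; the file layout and output handling here are assumptions for illustration, and only the fud invocation is taken from this PR:

```python
import glob
import subprocess

with open("resources.txt", "w") as out:
    # Assumed location of the queue designs; adjust to the repo's actual layout.
    for futil in sorted(glob.glob("tests/**/*.futil", recursive=True)):
        out.write(f"=== {futil} ===\n")
        report = subprocess.run(
            ["fud", "e", "--to", "resource-estimate", futil],
            capture_output=True, text=True,
        )
        out.write(report.stdout)
```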

@anshumanmohan I think it would be odd to commit cycles.txt and resources.txt; I've only added them to show folks reviewing this PR the results. As for where they should live, I'm thinking we make a discussion thread in the packet scheduling repo and park it there. Perhaps that thread could be a running log of synthesis results and cycle counts as we continue tweaking our designs?

@polybeandip marked this pull request as ready for review September 12, 2024 20:45
@polybeandip (Collaborator, Author) commented Sep 13, 2024

Pretty pictures!

Attached plots: cycles_round_robin.png, cycles_strict.png, lut_round_robin.png, lut_strict.png, registers_round_robin.png, registers_strict.png, worst_slack_round_robin.png, worst_slack_strict.png, muxes_round_robin.png, muxes_strict.png

@rachitnigam (Contributor) commented Sep 13, 2024

Notably, this means our results (resources.txt) are missing certain stats (for example, BRAMs) and use the default .xdc file: i.e. a clock period of 7 ns. Per @sampsyo's suggestion, this frequency should be okay for now.

Thanks for running synthesis @polybeandip! The frequency number you want to report is determined by the worst_slack value. You can calculate the best achievable frequency as best_freq = 1000/(7 - worst_slack) MHz, with worst_slack in ns. In general, you want to compare the total execution time, cycle_count / best_freq, to determine which design is better.
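
A quick sketch of that calculation in Python; the cycle counts are the ones reported earlier in this thread, but the worst_slack values below are made-up placeholders, not results from this PR:

```python
designs = {
    "binheap/round_robin/rr_2flow": {"cycles": 1_870_740, "worst_slack": 2.0},  # slack assumed
    "round_robin/rr_2flow":         {"cycles": 993_313,   "worst_slack": 0.5},  # slack assumed
}

for name, d in designs.items():
    best_freq_mhz = 1000 / (7 - d["worst_slack"])        # 7 ns default clock period
    total_time_ms = d["cycles"] / (best_freq_mhz * 1e3)  # MHz = 1e3 cycles per ms
    print(f"{name}: {best_freq_mhz:.1f} MHz, {total_time_ms:.2f} ms")
```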

@polybeandip (Collaborator, Author)

Ah, perfect! I'll crunch those numbers soon

@anshumanmohan (Contributor)

As for where they should live, I'm thinking we make a discussion thread in the packet scheduling repo and park it there. Perhaps that thread could be a running log of synthesis results and cycle counts as we continue tweaking our designs?

Agreed, thank you!

@polybeandip (Collaborator, Author)

More graphs, this time using @rachitnigam's suggestion to compute the total time spent on our workload. This way, we account for differences in clock speed.

Attached plots: muxes_round_robin.png, muxes_strict.png

@anshumanmohan (Contributor)

@polybeandip so @ayakayorihiro has the same request as Parth! Would you mind sending her the same things?
